Social intelligence architecture using social media message queues

ABSTRACT

A social intelligence system is presented that streams information from a source, queues the streamed information, analyzes/scores the queued data, and stores the analyzed/scored data in an analysis database. The analyzed/scored data can then be retrieved from the database for post-processing and stored in a client specific database for further reporting. By streaming the data into various message queues and scoring the data before storing in the analysis database, large volumes of data can be efficiently processed and analyzed for a particular person and/or entity.

RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 17/381,275 filed Jul. 21, 2021, which is a continuation of U.S. patent application Ser. No. 15/237,138 filed Aug. 15, 2016 (now U.S. Pat. No. 11,086,885 issued on Aug. 10, 2021), which is a continuation of U.S. patent application Ser. No. 13/465,307 filed May 7, 2012 (now U.S. Pat. No. 9,418,389, issued on Aug. 16, 2016), the entire contents of each of which are incorporated herein by reference.

BACKGROUND

Information may be conveyed through a variety of mediums. For decades, printed media such as newspapers and/or magazines were used to convey information to a multitude of people. Now, the Internet makes it possible to convey information through many forms of electronic media.

Web-sites are a common way to convey information over the Internet. Additional forms of electronically conveying information have developed and taken off in the past few years. Blogs and social media/networking web-sites allow users less proficient with computers to log on and convey their opinion about people, companies, products, or anything that may be of interest.

As easily as information can be transmitted over the Internet, the very same information can be monitored and analyzed. Sentiment analysis technology analyzes information presented in both electronic and non-electronic media in order to help determine a particular viewpoint or opinion about a particular topic.

For example, an individual may log on to Facebook® and post a message about how much they like the new Nintendo 3DS®. Sentiment analysis technology can retrieve and store the user's post and determine that the user's sentiment towards the Nintendo 3DS® is positive.

Current sentiment analysis technology typically acquires mass volumes of social media data and stores the data in a relational database. The data can later be retrieved for sentiment analysis, and individuals and/or corporations can determine the overall sentiment about a topic based on the collected and analyzed data.

Unfortunately, these systems do not efficiently process the mass volumes of data that must be collected and organized in a relational database before sentiment analysis can be performed and before the data can be customized for a particular person and/or entity. Thus, there is a need for a system that quickly and efficiently analyzes large volumes of social media data for sentiment and stores it in a database for post-analysis by a particular individual and/or entity.

BRIEF SUMMARY

A social intelligence system is presented that streams information from a source, queues the streamed information, analyzes/scores the queued data, and stores the analyzed/scored data in an analysis database. The analyzed/scored data can then be retrieved from the database for post-processing and stored in a client specific database for further reporting. By streaming the data into various message queues and scoring the data before storing in the analysis database, large volumes of data can be efficiently processed and analyzed for a particular person and/or entity.

A method for analyzing social media data in an information processing apparatus having one or more processors includes receiving, using a data transmission device, social media data streams from one or more social media sources as social media data segments. The method further comprises queuing the received social media data segments into one or more social media message queues, the one or more social media message queues buffering the social media data segments as the segments are streamed from the one or more social media sources, scoring the buffered social media data segments based upon one or more predefined factors, performing, using the one or more processors, a sentiment analysis for an entity using the social media data segments and a score associated with the social media data segments, and storing the social media data segments and their associated score of the data segment into an analysis database.

A non-transitory computer-readable storage medium having computer-readable code embodied therein which, when executed by a computer having one or more processors, performs the method for analyzing social media data according to the preceding paragraph.

Another aspect of the technology relates to a social intelligence apparatus that includes a memory having one or more social media message queues and one or more processors, coupled to the memory, configured to execute social media data analysis in the social intelligence apparatus. The one or more processors are also configured to receive, using a data transmission device, social media data streams from one or more social media sources as social media data segments, queue the received social media data segments into one or more social media message queues, the one or more social media message queues buffering the social media data segments as the segments are streamed from the one or more social media sources, score the buffered social media data segments based upon one or more predefined factors, perform, using the one or more processors, a sentiment analysis for an entity using the social media data segments and a score associated with the social media data segments, and store the social media data segments and their associated score of the data segment into an analysis database.

Another aspect of the technology relates to a social intelligence system that includes one or more social media devices and a social intelligence apparatus. The one or more social media devices have a memory configured to store social media data, one or more processors configured to process social media data, and a transceiver configured to transmit social media data. The social intelligence apparatus has a memory having multiple social media message queues, a transceiver configured to receive social media data from the one or more social media devices, and one or more processors, coupled to the memory, and configured to execute social media data analysis in the social intelligence apparatus. The one or more processors are also configured to receive, using a data transmission device, social media data streams from one or more social media sources as social media data segments, queue the received social media data segments into one or more social media message queues, the one or more social media message queues buffering the social media data segments as the segments are streamed from the one or more social media sources, score the buffered social media data segments based upon one or more predefined factors, perform, using the one or more processors, a sentiment analysis for an entity using the social media data segments and a score associated with the social media data segments, and store the social media data segments and their associated score of the data segment into an analysis database.

In a non-limiting, example implementation the social media data segments are parsed and scored using a natural language processor, and the results of the sentiment analysis are stored in an entity specific database.

In another non-limiting, example implementation the social media data segments are analyzed for a positive, neutral, or negative sentiment of an author of the social media data segment.

In yet another non-limiting, example implementation the multiple social media sources comprise at least one of publications, social media websites, forums, blogs, radio broadcasts, and/or television broadcasts.

In another non-limiting, example implementation social media data segments and their associated scores are stored in a historical database from the analysis database after a predetermined period of time.

In yet another non-limiting, example implementation the one or more social media message queues comprise high speed memory capable of quickly buffering and accessing the social media data segments.

In another non-limiting, example implementation a sentiment report is generated based on the performed sentiment analysis for the entity.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 depicts a block diagram of an example embodiment of a social intelligence apparatus;

FIG. 2 is a block diagram of an example embodiment showing further detail of the social intelligence apparatus;

FIG. 3 is a block diagram of an example embodiment of the relationship between the components of the social intelligence apparatus;

FIG. 4 is a diagram of an example embodiment of a social intelligence system;

FIG. 5 is an application flowchart of an example embodiment of a flow of processes that can be implemented by the social intelligence apparatus; and

FIG. 6 is an application flowchart of an example embodiment of a flow of processes for determining whether certain data should be stored in the historic databases.

DETAILED DESCRIPTION OF THE TECHNOLOGY

In the following description, for purposes of explanation and non-limitation, specific details are set forth, such as particular nodes, functional entities, techniques, protocols, standards, etc. in order to provide an understanding of the described technology. It will be apparent to one skilled in the art that other embodiments may be practiced apart from the specific details described below. In other instances, detailed descriptions of well-known methods, devices, techniques, etc. are omitted so as not to obscure the description with unnecessary detail. Individual function blocks are shown in the figures. Those skilled in the art will appreciate that the functions of those blocks may be implemented using individual hardware circuits, using software programs and data in conjunction with a suitably programmed microprocessor or general purpose computer, using application specific integrated circuitry (ASIC), and/or using one or more digital signal processors (DSPs). The software program instructions and data may be stored on a computer-readable storage medium and when the instructions are executed by a computer or other suitable processor control, the computer or processor performs the functions. Although databases may be depicted as tables below, other formats (including relational databases, object-based models, and/or distributed databases) may be used to store and manipulate data.

Although process steps, algorithms or the like may be described or claimed in a particular sequential order, such processes may be configured to work in different orders. In other words, any sequence or order of steps that may be explicitly described or claimed does not necessarily indicate a requirement that the steps be performed in that order. The steps of processes described herein may be performed in any order possible. Further, some steps may be performed simultaneously despite being described or implied as occurring non-simultaneously (e.g., because one step is described after the other step). Moreover, the illustration of a process by its depiction in a drawing does not imply that the illustrated process is exclusive of other variations and modifications thereto, does not imply that the illustrated process or any of its steps are necessary to the invention(s), and does not imply that the illustrated process is preferred.

Various forms of computer readable media/transmissions may be involved in carrying data (e.g., sequences of instructions) to a processor. For example, data may be (i) delivered from RAM to a processor; (ii) carried over any type of transmission medium (e.g., wire, wireless, optical, etc.); (iii) formatted and/or transmitted according to numerous formats, standards or protocols, such as Ethernet (or IEEE 802.3), SAP, ATP, Bluetooth, and TCP/IP, TDMA, CDMA, 3G, etc.; and/or (iv) encrypted to ensure privacy or prevent fraud in any of a variety of ways well known in the art.

Social media and traditional media (e.g., print media, online media, broadcast media) are very useful for corporate marketing purposes. The technology described below relates to monitoring, analyzing, and evaluating media data, including social media data.

In an example implementation, unfiltered and pre-filtered data can be pulled from external sources through Simple Object Access Protocol (SOAP) XML requests, for example. It should be appreciated that pre-filtered data can relate to the apparatus avoiding unnecessary volumes of irrelevant social media data from sources through the application of specified keywords and/or phrases or through the application of terms to be excluded. Thus, only materials that pass pre-filtering will be passed through as data that can be pulled from external sources.
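The keyword-based pre-filtering described above can be illustrated with a minimal sketch. The function names, the case-insensitive substring matching, and the sample posts are assumptions made for illustration only; the description does not specify how keywords or excluded terms are matched.

```python
from typing import Iterable, List


def passes_prefilter(text: str, include_terms: Iterable[str],
                     exclude_terms: Iterable[str]) -> bool:
    """Return True when text has at least one include term and no exclude term.

    Matching is a case-insensitive substring test (an assumption of this sketch).
    """
    lowered = text.lower()
    if any(term.lower() in lowered for term in exclude_terms):
        return False
    return any(term.lower() in lowered for term in include_terms)


def prefilter(items: Iterable[str], include_terms: List[str],
              exclude_terms: List[str]) -> List[str]:
    """Keep only the items that pass pre-filtering."""
    return [item for item in items
            if passes_prefilter(item, include_terms, exclude_terms)]


# Hypothetical sample data.
posts = [
    "Loving my new Nintendo 3DS",
    "Nintendo 3DS giveaway spam, click here",
    "Weather is nice today",
]
kept = prefilter(posts, include_terms=["nintendo 3ds"], exclude_terms=["giveaway"])
```

In this sketch an excluded term takes precedence over an included keyword, which is one possible reading of "terms to be excluded."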

The feed data is stored in message queues where scheduling services can retrieve the data from the message queues. Upon retrieving the queued data, the data is parsed and analyzed using a Natural Language Processor (NLP), where the analyzer can pick up positive, negative, or neutral sentiment from the analyzed data. Based on the sentiment, a sentiment score can be assigned and associated with the data. The analyzed data along with its associated score can then be stored in a database.
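As a rough illustration of the queue, analyze, score, and store sequence just described, the following sketch uses Python's standard in-process queue. The word lists and the toy_sentiment function are hypothetical stand-ins for a Natural Language Processor, and a plain list stands in for the database; none of these are taken from the described system.

```python
import queue

# Hypothetical word lists standing in for a real NLP component.
POSITIVE = {"love", "like", "great"}
NEGATIVE = {"hate", "awful", "broken"}


def toy_sentiment(text):
    """Return (label, score) from a simple count of positive and negative words."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    total = sum((w in POSITIVE) - (w in NEGATIVE) for w in words)
    if total > 0:
        return "positive", total
    if total < 0:
        return "negative", total
    return "neutral", 0


def drain_and_score(message_queue, analysis_store):
    """Retrieve every queued segment, score it, and store text plus score."""
    while True:
        try:
            segment = message_queue.get_nowait()
        except queue.Empty:
            break
        label, score = toy_sentiment(segment)
        analysis_store.append({"text": segment, "sentiment": label, "score": score})
        message_queue.task_done()


feed_queue = queue.Queue()
for post in ["I love the new handheld!", "The screen arrived broken.", "It ships in March."]:
    feed_queue.put(post)

analysis_store = []  # stands in for the database in this sketch
drain_and_score(feed_queue, analysis_store)
```

The point of the sketch is the ordering: each segment is scored straight out of the queue, and only the scored result is written to storage.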

As explained below, using message queues to process large volumes of social media data (e.g., Twitter, Facebook) allows for the automated analysis of large volumes in near real time. Data can be processed within minutes of its availability from a service feed provider and made available for a client of the system to include in the client's analysis.

The technology described herein enables the processing of vast volumes of social media data with sentiment analysis to determine tonality within a piece of social media data. For example, data can be scored with integer values or real, non-integer values (e.g., between 0.02 and any positive integer (+1)) where the value of 0 is normally reserved for NULL instances. The tonality (e.g., aggregation of sentiments from data) can include at least one of a negative, neutral, and/or positive sentiment and also a factual and/or NULL sentiment. These sentiments can be assigned the following values: positive sentiment=+N, neutral sentiment=0.02N, negative sentiment=−N, factual sentiment=99N, and NULL=0. The apparatus can assign a tonality value associated with the sum of sentiment assigned to each word in the data and the value can be in a range from any negative integer (e.g., −1) to any positive integer (e.g., +1) where 0 can be reserved for NULL occurrences.
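The value assignments above can be sketched as a small lookup plus a per-word sum. Several things here are assumptions of the sketch rather than statements from the description: N is fixed at 1, the word-to-sentiment lexicon is hypothetical, and the case where positive and negative words cancel exactly is mapped to the neutral value so that 0 remains reserved for NULL.

```python
N = 1.0  # assumed scale factor for this sketch

# Values as listed in the description, expressed in terms of N.
SENTIMENT_VALUES = {
    "positive": +N,
    "neutral": 0.02 * N,
    "negative": -N,
    "factual": 99 * N,
    "null": 0.0,
}

# Hypothetical per-word lexicon; a real analyzer would supply this.
WORD_SENTIMENT = {
    "love": "positive",
    "great": "positive",
    "hate": "negative",
    "slow": "negative",
}


def tonality(text):
    """Sum the sentiment value of each recognized word in the text."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    scored = [SENTIMENT_VALUES[WORD_SENTIMENT[w]] for w in words if w in WORD_SENTIMENT]
    if not scored:
        # No word carried sentiment: treated as a NULL instance.
        return SENTIMENT_VALUES["null"]
    total = sum(scored)
    # Assumption: an exact cancellation is reported as neutral, not NULL.
    return total if total != 0 else SENTIMENT_VALUES["neutral"]
```

For instance, tonality("I hate it") evaluates to -1.0 under these assumptions, while text with no recognized words evaluates to 0.0.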

The technology allows for processing large volumes of social media data from various sources without the need to store the data first in a relational database (or some form of computer-readable storage media) prior to automatically analyzing the social media data for sentiment and providing tonality scoring. This allows for very fast and dynamic handling of large data volumes to modify the services required to handle the volume.

In addition, the technology enables the rapid analysis of the social media data, providing near real time analysis and reporting of social media events as they occur. The technology also provides resilience by using multiple configurations of servers and services, the number of which is determined by the volume of social media data arriving in a particular system.

FIG. 1 shows a non-limiting, example embodiment of a block diagram of a social intelligence apparatus. In FIG. 1, a social intelligence apparatus 100 receives streamed social media data from social media streams 200 a-200 n over a network. The social media streams 200 a-200 n can come from a variety of sources. For example, social media streams 200 a-n can be streamed from social media web-sites, such as Twitter®, Facebook®, or LinkedIn®. The social media streams 200 a-n can also come from blog entries or web-sites. The data provided by the streams 200 a-n is not limited to web-sites and can be data transmitted from media outlets, such as publishers and/or news companies (e.g., Fox®, CNN®).

The streams can be any continuously broadcast/Internet channel for social media whereby items are continually added in an unrestricted manner (e.g., there are no cyclical start and endpoints). The data includes but is also not limited to data that is traditionally submitted electronically and can include print data, such as a magazine or newspaper, for example. Of course, one skilled in the art appreciates that in an example embodiment, the print data is converted to an electronic form and transmitted as text data via the social media streams 200 a-n. The social media streams 200 a-n are provided by one or more information processing devices capable of transmitting data to the social intelligence apparatus 100.

The social intelligence apparatus 100 can be configured to have at least one CPU 101, at least one memory 102, and at least one data transceiver device (DTD) 103. The social intelligence apparatus 100 is also configured to have one or more social media data queues (SMQ) 104, at least one social media data analyzer (SMA) 105, and at least one database (DB) 106 configured to store analyzed social media data. In an example embodiment, the social media data streamed from the streams 200 a-n is processed by the social intelligence apparatus 100 using the queues 104, analyzed and scored using the analyzer 105, and then stored in the database 106. The data streamed from the social media data streams 200 a-n can be received by the social intelligence apparatus 100 using the DTD 103.

FIG. 2 shows a non-limiting, example embodiment of a block diagram showing further detail of the social intelligence apparatus 100 communicating with the social media data streams 200 a-n. As can be seen in FIG. 2, the streams 200 a-n have respective social media data 201 a-n. The social media data 201 a-n can be any form of information, but in an example embodiment, is normally data taken from different social media sources. For example, the social media data 201 a may be various posts on Facebook®, the social media data 201 b may be various posts on Twitter®, and the social media data 201 c may be articles taken from the New York Times®.

As the data is streamed in from the streams 200 a-n, the social intelligence apparatus 100 queues the data using the social media data queues 104 a-n. Although FIG. 2 shows queues 104 a-n corresponding to the streams 200 a-n, the queues 104 a-n may receive and queue data from any of the streams 200 a-n.

Each social media stream (SMS) 200 a-n can be configured to have a CPU 202 a-n, memory 203 a-n, and DTD 204 a-n. It should be appreciated that social media data 201 a-n can be stored in the memory 203 a-n of each respective SMS 200 a-n and social media data 201 a-n can be transmitted to one or more social intelligence apparatus 100 using DTDs 204 a-n.

In an example embodiment, social media data 201 a may contain data streams originating from user posts on Facebook®. For example, in a given period of time (e.g., 1 hour), Facebook® may generate over 100,000 user posts. These posts can be streamed from Facebook® from social media data stream 200 a over the network, where the apparatus 100 receives the data and begins storing the data in the social media queues 104 a-n. This architecture advantageously enables the apparatus 100 to process and analyze large volumes of data over the network without having to first store the data in a relational database.

FIG. 3 shows a block diagram of a non-limiting, example embodiment of the relationship between the components of the apparatus 100. In an example embodiment, data is processed from the social media data queues 104 a-n via the social media data analyzers 105 a-n. The social media data analyzers 105 a-n analyze the social media data for sentiment and provide a score for each data segment. For example, an individual may write a post on Facebook® about how much he/she likes the design of the Apple iPad 3®. The data from the post is queued in the social media queue 104 a and submitted to analyzer 105 a. As the sentiment for the post is generally positive, the analyzer 105 a will give it a positive score.

The score may be a numerical value (e.g., a positive number for a positive sentiment, zero for neutral sentiment, and a negative number for negative sentiment) or could even be a description such as “positive,” “neutral,” or “negative.” In this example, the sentiment may be scored from a different viewpoint (e.g., from the viewpoint of a competitor) and as such, may be a negative score as opposed to a positive score. That is, a competitor to Apple® may not benefit from positive sentiment about a particular Apple® product, so the analyzer 105 a may instead score the data as negative instead of positive.
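One way to picture viewpoint-dependent scoring is a sign flip when the subject of a post is a competitor of the party for whom the score is computed. The COMPETITORS table, the ExampleCo name, and the simple negation rule below are hypothetical choices for this sketch; the description does not state how the analyzer derives a competitor's score.

```python
# Hypothetical mapping: viewpoint -> set of that viewpoint's competitors.
COMPETITORS = {"ExampleCo": {"Apple"}}


def score_from_viewpoint(base_score, subject, viewpoint):
    """Invert the score when the post's subject competes with the viewpoint."""
    if subject in COMPETITORS.get(viewpoint, set()):
        return -base_score
    return base_score


own_view = score_from_viewpoint(1.0, subject="Apple", viewpoint="Apple")
rival_view = score_from_viewpoint(1.0, subject="Apple", viewpoint="ExampleCo")
```

Here the same positive post keeps its score from the subject's own viewpoint and becomes negative from the assumed competitor's viewpoint.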

Once the analyzers 105 a-n analyze a particular piece of data, the data is then moved into databases 106 a-n. Databases 106 a-n may be separate databases for storing scored data or a single database storing the entire scored data. The databases 106 a-n store both the social media data 106 a-1-106 n-1 and the respective social media data score 106 a-2-106 n-2 for each social media data 106 a-1-106 n-1. By providing a pipeline for queuing, scoring, and then storing the data, the system can process and analyze very large volumes of streamed data prior to the data being stored in a relational database.

FIG. 4 shows a diagram of a non-limiting, example embodiment of a social intelligence system. FIG. 4 depicts a pipeline approach to processing data streamed from various sources. As can be seen in FIG. 4, data is streamed over the Internet from various media sources, including, but not limited to, Moreover News, Twitter, and Shadow TV. The data is streamed into Streaming Services Srv1-n where streaming servers can pull data from the different sources.

The Streaming Services Srv1-n stream the data to Message Queues MSMQ1-n. The data can then be retrieved from Message Queues MSMQ1-n by Analysis Services Srv1-n. As mentioned above, Analysis Services Srv1-n will analyze data retrieved from Queues MSMQ1-n for sentiment and score the particular data (e.g., via a general social media analyzer). After the data is analyzed, it can be moved into Analysis Databases DB Srv1-n. As also mentioned above, the Databases DB Srv1-n can store both the data and its associated score.

After Analysis Services Srv1-n analyze the data, the data can instead be stored in Historic Databases DB Srv1-n. As described in further detail below, certain data may not be particularly relevant for storage in the Analysis Databases DB Srv1-n and instead can be analyzed and scored and stored in a separate Historic Database DB Srv1-n. For example, data may be created prior to a certain date or may be stale (e.g., a user commenting on Facebook® about a video game system no longer in the marketplace). This data can still be analyzed and scored but instead stored in the Historic Databases DB Srv1-n.

Upon storing the data in the Analysis Databases DB Srv1-n, the data can be stored in Client Specific Databases DB1-n using Sentiment Services Srv1-n. Along with the Client Specific Databases DB1-n is a separate Configuration Database Cfg DB where different client specific databases can be configured. It should be appreciated that a sentiment controller can pull the data from an analysis database to determine the relevance of the data to a client of the system. The sentiment controller will then move the data to the client database. The Configuration Database Cfg DB can then create/modify client database configurations that can determine, for example, when a user logs onto the system and/or which databases the user can access.

The system can also provide a User Interface UI Srv1-n where a user can customize how information is presented to them. It should be appreciated that the interface may be a web-based interface or can even be a desktop client interface. The User Interfaces UI Srv1-n are configured to allow a user to see, for example, a particular sentiment on a topic. For example, marketing personnel at Sony® may be interested in finding out what people and/or media sources are saying about the Sony PlayStation 3®. The user can access the interface and a report will be provided to the user showing the overall sentiment for the PlayStation 3.

For example, users on Facebook® and Twitter® may be posting positive comments about games and features for the PlayStation 3 and the report will indicate the overall positive sentiment to the user using the User Interface UI Srv1-n. It should also be appreciated that a user may direct various information to the Historic Databases DB Srv1-n if the user deems that certain information may not be particularly relevant to the user. Using the example above, the user may receive information regarding the very first Sony PlayStation® and may not want that information for sentiment analysis. Thus, the user can direct such information to the Historic Databases DB Srv1-n so it may be retained but not used in the report.
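A report of overall sentiment for a topic could be assembled from stored, scored records along the following lines. The record layout, the substring topic match, and the choice of the most frequent label as the "overall" sentiment are all assumptions of this sketch; the description does not define how the report aggregates sentiment.

```python
from collections import Counter


def sentiment_report(records, topic):
    """Summarize stored sentiment labels for records that mention the topic."""
    relevant = [r for r in records if topic.lower() in r["text"].lower()]
    counts = Counter(r["sentiment"] for r in relevant)
    # Assumption: "overall" is simply the most frequent label.
    overall = counts.most_common(1)[0][0] if counts else "none"
    return {
        "topic": topic,
        "mentions": len(relevant),
        "counts": dict(counts),
        "overall": overall,
    }


# Hypothetical contents of a client specific database.
client_records = [
    {"text": "PlayStation 3 exclusives are fantastic", "sentiment": "positive"},
    {"text": "Great new features on the PlayStation 3", "sentiment": "positive"},
    {"text": "My PlayStation 3 overheats", "sentiment": "negative"},
    {"text": "Bought a new television", "sentiment": "neutral"},
]
report = sentiment_report(client_records, "PlayStation 3")
```

Records moved to a historic database would simply be absent from the input list, so they are retained elsewhere but do not influence the report.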

FIG. 5 depicts an application flowchart of a non-limiting, example embodiment of a flow of processes that can be implemented by the social intelligence apparatus 100. The process begins in S5-1 where social media data is streamed from one or more social media data sources. As described above, the social media data may come from various sources such as Facebook®, web-site blogs, or even newspapers and magazines.

Upon receiving the streamed social media data, the streamed data is queued into one or more data queues (S5-2). As explained above, the queues are capable of queuing the streamed social media data and can handle large volumes of data. After the data is stored in the one or more data queues, the data can be retrieved from the queues and analyzed/scored based on a particular sentiment of the data (S5-3).

After the data is analyzed/scored for a particular sentiment, the data and its associated score can be stored in one or more analysis databases (S5-4). From there, it can be determined if the data should be used for a client specific analysis (S5-5). If so, a separate client specific database can be created where the stored and analyzed data can be separately stored (S5-6). If a database already exists for a particular client, the data can be added to the particular database. If the data is not meant to be stored in a client specific database, it can then be determined if the data should be stored in the historic databases (S5-7). If so, the data can be moved from the client specific database or moved from the analysis database to the historic database (S5-8). Examples of criteria for determining if data should be stored in the historic database are discussed with respect to FIG. 6.
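The routing decisions in steps S5-4 through S5-8 can be sketched as follows. Lists and a dictionary stand in for the analysis, client specific, and historic databases, and the topic-based relevance test and the is_stale predicate are hypothetical; the description leaves those criteria to the configuration of the system.

```python
from collections import defaultdict


def route_record(record, analysis_db, client_dbs, historic_db, client_topics, is_historic):
    """Store one scored record, then route it as in steps S5-4 to S5-8."""
    analysis_db.append(record)                                # S5-4: store data and score
    text = record["text"].lower()
    matched = [client for client, topics in client_topics.items()
               if any(topic in text for topic in topics)]     # S5-5: client specific?
    for client in matched:
        client_dbs[client].append(record)                     # S5-6: create or extend client DB
    if not matched and is_historic(record):                   # S5-7: historic?
        analysis_db.remove(record)                            # S5-8: move to historic DB
        historic_db.append(record)
    return matched


analysis_db, historic_db = [], []
client_dbs = defaultdict(list)                  # a client DB is created on first use
client_topics = {"sony": ["playstation 3"]}     # hypothetical client configuration


def is_stale(record):
    """Hypothetical historic criterion: anything originating before 2010."""
    return record["year"] < 2010


route_record({"text": "PlayStation 3 games are great", "score": 1, "year": 2012},
             analysis_db, client_dbs, historic_db, client_topics, is_stale)
route_record({"text": "My old pager still works", "score": 1, "year": 2003},
             analysis_db, client_dbs, historic_db, client_topics, is_stale)
```

Using a defaultdict mirrors the text's two cases: a client specific database is created if none exists, and otherwise the record is added to the existing one.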

FIG. 6 shows an application flowchart of a non-limiting, example embodiment of a flow of processes for determining whether certain data should be stored in the historic databases. In one example, the data may be generated/streamed where the origin of the data may be prior to a specific date (S6-1). For example, the streamed data may contain an old blog generated several years ago and may not be as relevant for analysis. If so, the data can be moved and stored in the historic database (S6-6).

In another example, the data may be stored in the analysis database for a duration of time that is considered too long (S6-2). For example, the analysis database may contain data analyzed from a Facebook® post that has been in the analysis database for over 6 months. Of course, 6 months is just an example and the system can be configured to move data that has been in the database for longer or shorter periods of time. If it is determined that the data is in the database for too long, the data can be moved and stored in the historic database (S6-6).

In another example, data not relevant to a particular client may be stored in the historic database (S6-3). For example, a Facebook® post just generally describing sentiment about a particular technology (e.g., “I hate cell phones”), though describing a negative sentiment, may not be relevant to any particular client. Thus, such data may also be stored in the historic database (S6-6).

In yet another example, the data may be relevant to a particular client but not relevant to any products of interest (S6-4). For example, a blog entry may describe how much the author likes Nintendo®. Although the sentiment is generally positive, without more, there is nothing to link the blog entry to one or more products for Nintendo. Thus, such data may be advantageous for Nintendo's purposes but not necessary for any particular analysis and can be stored in the historic database (S6-6).

In yet another example, a user may simply intentionally store particular data in the historic database (S6-5). As discussed above, the user may see the relevant data via a user interface and may determine that certain data is not relevant for analysis, thus intentionally moving such data to the historic database (S6-6).
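The five example criteria of FIG. 6 can be gathered into a single check, sketched below. The ScoredRecord fields, the fixed order in which the criteria are tested, and the 182-day default (an approximation of the 6-month example) are assumptions of this sketch; the description presents the criteria as independent examples rather than an ordered sequence.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ScoredRecord:
    created: datetime            # when the source item originated
    analyzed: datetime           # when it entered the analysis database
    client_relevant: bool        # relevant to a particular client
    product_relevant: bool       # tied to a product of interest
    user_archived: bool = False  # user chose to move it


def historic_reason(rec: ScoredRecord, now: datetime, origin_cutoff: datetime,
                    max_age: timedelta = timedelta(days=182)) -> Optional[str]:
    """Return the first matching step label, or None to keep the record."""
    if rec.created < origin_cutoff:
        return "S6-1"            # origin prior to a specific date
    if now - rec.analyzed > max_age:
        return "S6-2"            # in the analysis database too long
    if not rec.client_relevant:
        return "S6-3"            # not relevant to a particular client
    if not rec.product_relevant:
        return "S6-4"            # client-relevant, but no product of interest
    if rec.user_archived:
        return "S6-5"            # user intentionally archived it
    return None
```

Any non-None result corresponds to step S6-6, moving the record to the historic database; the returned label only records which criterion triggered the move.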

As mentioned above, the technology described in this application takes large volumes of social media data in real-time (or near real-time) from a social media source (e.g., through the Internet and/or social streams). The data is placed into message queues for handling large volumes of data in a fast and efficient manner to continue to provide near real-time data collection and resilience to failure. The data can then be sentiment analyzed using NLP technology directly from the message queues while maintaining performance and near real-time processing. The data can then be stored in an analysis database for post-processing and that data can be pulled from the database using sentiment controllers to determine the relevance to a particular client. From there, the data can be stored for the particular client in the system to ensure the client only obtains relevant data, and a user interface can then allow the client to perform their own analysis on the data stored in the client specific database.

While the technology has been described in connection with example embodiments, it is to be understood that the technology is not to be limited to the disclosed embodiments. On the contrary, the technology covers various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

1. A method for analyzing social media data in an information processing apparatus, the information processing apparatus having one or more processors, the method comprising: receiving, using a data transmission device, social media data streams from one or more social media sources as social media data segments; queuing the received social media data segments into one or more social media message queues, the one or more social media message queues buffering the social media data segments as the segments are streamed from the one or more social media sources; scoring the buffered social media data segments based upon one or more predefined factors; performing, using the one or more processors, a sentiment analysis for an entity using the social media data segments and a score associated with the social media data segments; and storing the social media data segments and their associated score of the data segment into an analysis database.